A Tool for Arabic Documents Indexing and Retrieval From a Web Virtual Library
Abstract
This paper presents a method for the automatic indexing and retrieval of Arabic documents from a virtual library. The library may be multilingual, holding documents written in several languages. All documents are scanned before being stored in the library. The indexing method uses the document contents themselves as indexes: the documents are first scanned and then passed to OCR software, which produces a text version of their contents. In a second phase, this text is fed to a module that automatically converts it to HTML (or XML), in which each part of the document contents becomes a hyperlink to the corresponding scanned page image. The end user can then request a PostScript version of the document for download. The method had previously been tested on Latin-script documents, specifically scientific journals; this paper presents its adaptation to Arabic journals and other kinds of documents.
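The abstract does not describe how the conversion module is implemented. The following is a minimal Python sketch of the idea, not the authors' system. It assumes the OCR output is already available as plain text for each scanned page. The `ScannedPage` type, the function names, and the image paths are all hypothetical. Each non-empty OCR line becomes an HTML paragraph that links back to its page image, with `dir="rtl"` set for Arabic. A simple inverted index then maps each word to the pages that contain it.

```python
import html
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScannedPage:
    """One scanned page (hypothetical type): its image location and its OCR text."""
    image_url: str  # hypothetical path/URL of the scanned page image
    ocr_text: str   # text produced by the (unspecified) OCR step


def pages_to_html(title: str, pages: list[ScannedPage], lang: str = "ar") -> str:
    """Render OCR text as HTML; every non-empty text line links to its page image."""
    direction = "rtl" if lang == "ar" else "ltr"  # Arabic is written right to left
    parts = [
        "<!DOCTYPE html>",
        f'<html lang="{lang}" dir="{direction}">',
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>',
        "<body>",
    ]
    for page in pages:
        href = html.escape(page.image_url, quote=True)
        for line in page.ocr_text.splitlines():
            line = line.strip()
            if line:  # skip blank OCR lines
                parts.append(f'<p><a href="{href}">{html.escape(line)}</a></p>')
    parts.append("</body></html>")
    return "\n".join(parts)


def build_index(pages: list[ScannedPage]) -> dict[str, set[str]]:
    """Inverted index: each whitespace-separated token -> page images containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page in pages:
        for token in page.ocr_text.split():
            index[token].add(page.image_url)
    return dict(index)


if __name__ == "__main__":
    pages = [
        ScannedPage("scans/p1.png", "فهرسة الوثائق العربية\nمكتبة افتراضية"),
        ScannedPage("scans/p2.png", "استرجاع الوثائق"),
    ]
    print(pages_to_html("Example", pages))
    print(sorted(build_index(pages)["الوثائق"]))  # ['scans/p1.png', 'scans/p2.png']
```

Whitespace tokenization keeps the sketch short. Practical Arabic retrieval usually also needs character normalization, such as unifying alef variants, and handling for attached clitics such as the conjunction "و". This sketch does neither.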
Similar resources
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
A Survey of Indexing and Retrieval of Multimodal Documents: Text and Images
A document conveys information using multiple modalities, including text, layout/style and images. For example, journal articles usually have figures to illustrate experimental results, and the title in a journal article usually has a different font size than the body text. Indexing and retrieval using only text is the traditional way of IR (Information Retrieval). With the development of the I...
Assessing the Internal Structure of the Ellis Information Retrieval Model in Order to Present the Persian Norm of Web Retrieval Tools
Introduction: The study evaluated the internal structure of the Ellis information-seeking model in a student community, with the aim of presenting a Persian norm. Methods: This descriptive-analytical study used a cross-sectional survey in the second semester of the 1399-1400 academic year. The population comprised 280 graduate students at Ahvaz Jundishapur University of Medical Scien...
The effect of N-gram indexing on Arabic documents retrieval
This article compares 3-gram and 4-gram term indexing in Arabic document retrieval. Similarity between queries and documents is computed for single-term and two-term queries, over corpora of Arabic-language documents collected from Arabic news websites available online.
Progressive Discovery of Document Content
As the World-Wide Web, digital libraries and similar systems become ubiquitous, increasingly effective techniques are being developed to help readers locate useful documents. Information retrieval techniques for indexing and querying documents provide for accurate matching of documents against user queries. Graphical and textual query interfaces allow users to more easily and effectively specify thei...
Publication date: 2001